This report shows the gap-filled images produced by the fill_gaps.R script.
Different parameters were tested on the following data, using two weeks (one with good weekly coverage and one with poor coverage):
Region: Northwest Atlantic (NWA, 39 to 82 N, 42 to 95 W)
Sensor: MODIS
Resolution: 4 km
Processing level: Level 3, binned (L3b)
Year: 2015
Weeks: 9, 22
Pixels outside 0-64 mg m^-3 removed
Days with < 5% coverage removed
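The two filtering steps listed above can be sketched as follows. This is a minimal illustration in Python, not the code in fill_gaps.R: the function names, the flattened-list image layout, the inclusive range bounds, and the definition of coverage as the fraction of non-missing pixels in the image are all assumptions.

```python
from typing import List, Optional

# One flattened daily image; None marks a missing pixel (assumed layout).
Image = List[Optional[float]]


def mask_out_of_range(image: Image, lo: float = 0.0, hi: float = 64.0) -> Image:
    """Set pixels outside [lo, hi] mg m^-3 to missing (bounds assumed inclusive)."""
    return [v if v is not None and lo <= v <= hi else None for v in image]


def coverage(image: Image) -> float:
    """Fraction of pixels in the image that hold a valid value."""
    if not image:
        return 0.0
    return sum(v is not None for v in image) / len(image)


def drop_sparse_days(days: List[Image], min_coverage: float = 0.05) -> List[Image]:
    """Keep only the days whose coverage is at least min_coverage."""
    return [d for d in days if coverage(d) >= min_coverage]


# Hypothetical data: one usable day and one empty day.
days = [[0.5, 70.0, None, 1.2], [None, None, None, None]]
masked = [mask_out_of_range(d) for d in days]
kept = drop_sparse_days(masked)
```

Masking is applied before the coverage check here, so a day made up mostly of out-of-range values would also be dropped; whether the original script uses that order is not stated.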
ImputeEOF sets aside randomly sampled valid pixels for cross-validation (CV). The number of CV pixels is 30 or 10% of the pixels, whichever is larger. The function keeps adding EOFs and calculating the RMSE between the real and reconstructed CV pixels until the difference between the current RMSE and the RMSE of the previous iteration falls below a threshold (i.e. adding the most recent EOF did not significantly improve the RMSE). This threshold, called the “tolerance”, depends on whether the data are filled in linear space or in log space, since a log RMSE is only a fraction of the size of a linear RMSE:
Tolerance for filling logged data: 0.001
Tolerance for filling linear data: 0.01
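The cross-validation sizing and stopping rule described above can be sketched as follows. This is an illustrative Python outline, not the ImputeEOF source: the function names, the rounding of the 10% figure, and the choice to return the EOF count from before the unhelpful EOF are assumptions. The `rmse_for` callback stands in for the reconstruction step and is hypothetical.

```python
from typing import Callable

# Tolerances quoted in this report.
TOLERANCE = {"log": 0.001, "linear": 0.01}


def n_cv_pixels(n_pixels: int, fraction: float = 0.10, minimum: int = 30) -> int:
    """Number of cross-validation pixels: the larger of 30 and 10% of the pixels."""
    return max(minimum, round(fraction * n_pixels))


def choose_n_eofs(rmse_for: Callable[[int], float], tolerance: float, max_eofs: int) -> int:
    """Add EOFs one at a time until the RMSE improves by less than the tolerance."""
    previous = rmse_for(1)
    for k in range(2, max_eofs + 1):
        current = rmse_for(k)
        if previous - current < tolerance:
            # The k-th EOF did not improve the RMSE enough; keep k - 1 (assumed).
            return k - 1
        previous = current
    return max_eofs


# Hypothetical cross-validation RMSE for 1 to 5 EOFs.
rmses = {1: 0.5, 2: 0.3, 3: 0.25, 4: 0.2495, 5: 0.2}
best = choose_n_eofs(lambda k: rmses[k], TOLERANCE["log"], max_eofs=5)
```

With these made-up values the fourth EOF improves the RMSE by only 0.0005, which is below the log tolerance of 0.001, so the search stops and keeps 3 EOFs.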
We start by using a year of data to fill the gaps, and compare different methods below. Then, using the best options, we’ll try using a longer time series.
For each method of filling gaps, we’ll examine the following:
The linear regression uses the standard major axis (SMA) method from lmodel2::lmodel2(), since it minimizes the area of the triangle formed by each point and the regression line rather than the distance in the x or y direction alone (i.e. it assumes there is error in both the independent and dependent variables, here the “real” and filled/reconstructed data).
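For reference, the SMA line has a simple closed form: the slope is the ratio of the standard deviations of y and x, with the sign of their covariance, and the line passes through the means. The sketch below shows that calculation in plain Python. It is not the lmodel2 implementation, and the function name `sma_regression` is made up for illustration.

```python
import math
from typing import Sequence, Tuple


def sma_regression(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (slope, intercept) of the standard major axis line of y on x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x has no variance")
    # |slope| = sd(y) / sd(x); the sign follows the covariance.
    slope = math.copysign(math.sqrt(syy / sxx), sxy)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical "real" and reconstructed values lying exactly on y = 2x.
slope, intercept = sma_regression([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```

Unlike ordinary least squares, swapping x and y gives the reciprocal slope, which is why the method suits comparisons where neither variable is error-free.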
Also note that for the tests that involve filling an 8day composite, in situ matchups should be interpreted with caution because of the long temporal bin and the changes that could occur in concentrations and patterns within that time span.
An analysis of DINEOF on the Canadian Pacific coast:
Hilborn A, Costa M. Applications of DINEOF to Satellite-Derived Chlorophyll-a from a Productive Coastal Region. Remote Sensing. 2018; 10(9):1449. https://doi.org/10.3390/rs10091449
Chla algorithm: OCx
Logged/linear data: Logged
Which is better - filling the gaps in 8day data, or filling gaps in daily data and then averaging it into an 8day image?
Although some R^2 metrics are higher for the daily filled version, and the RMSE for the total series and the week with good percent coverage are slightly lower, overall the 8day cross-validation data has a better fit and less bias (e.g. it identifies some patterns of higher concentration better than the daily fill), and gives a better reconstruction for weeks with poor percent coverage.
Filling 8day data:
Number of EOFs: 5
Total RMSE: 0.2224261
Week 9 RMSE: 0.2039704
Week 22 RMSE: 0.1992027
Filling daily data, then averaging to 8day:
Number of EOFs: 13
Total RMSE: 0.2024255
Week 9 RMSE: 0.2299895
Week 22 RMSE: 0.1829266
Temporal binning: 8day
Logged/linear data: Logged
Should the OCx or POLY4 algorithm be used? Note that POLY4 has been shown to remove some of the bias in the NWA.
OCx = global band-ratio
POLY4 = regional band-ratio, tuned to NWA
Although the POLY4 algorithm increases the RMSE, it also appears to remove some of the bias and provide a tighter fit around the 1:1 line of the CV regression, as well as improving the fit with the in situ matchups. POLY4 was tuned to remove the bias in the NWA that was present when using the OCx algorithm, creating a steeper gradient in chla concentration, which might explain the increase in RMSE as the higher range of chla could be harder to reconstruct.
OCx:
Number of EOFs: 5
Total RMSE: 0.2224261
Week 9 RMSE: 0.2039704
Week 22 RMSE: 0.1992027
POLY4:
Number of EOFs: 6
Total RMSE: 0.257584
Week 9 RMSE: 0.2671343
Week 22 RMSE: 0.2588712
Temporal binning: 8day
Chla algorithm: POLY4
Should we use logged data or linear data to fill the gaps?
Note the process for the log option:
(Note that the RMSE is smaller when fitting logged data since it was calculated in log space)
Logged data gives a smoother fill and better R^2 in the CV regressions as it is not negatively impacted by isolated spikes over relatively low and consistent concentrations.
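The scale difference between the two RMSE values can be seen with a small made-up example. The sketch below is Python with hypothetical concentrations, and it assumes base-10 logs, which this report does not specify. A single large miss at a high concentration dominates the linear RMSE but contributes far less in log space.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], estimated: Sequence[float]) -> float:
    """Root mean square error between two equal-length sequences."""
    squared = [(o - e) ** 2 for o, e in zip(observed, estimated)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical chla values (mg m^-3): mostly low, with one spike.
observed = [0.5, 1.0, 2.0, 20.0]
reconstructed = [0.6, 0.9, 2.5, 12.0]

linear_rmse = rmse(observed, reconstructed)
log_rmse = rmse(
    [math.log10(v) for v in observed],
    [math.log10(v) for v in reconstructed],
)
```

Because the two values are in different units, they should not be compared directly when judging which option reconstructs the data better.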
Logged:
Number of EOFs: 6
Total RMSE: 0.257584
Week 9 RMSE: 0.2671343
Week 22 RMSE: 0.2588712
Linear:
Number of EOFs: 5
Total RMSE: 1.806879
Week 9 RMSE: 0.7463928
Week 22 RMSE: 1.298033
If more satellite images are used in the algorithm, will it improve the results?
Hilborn and Costa (2018) found that pixel reconstruction improved with more data in a smaller region on the Canadian Pacific coast. Up until this point we have only used one year of data to fill the gaps, but here we’ll try adding more (an equal number of years on either side of the target year, 2015).
Note that the 3year/5year DINEOF runs use the same cross-validation pixels for 2015 with extra randomly-selected pixels from the remaining years. Also, the CV regression below is performed using only the CV pixels for 2015 to give a more accurate comparison between methods.
Overall, expanding the time series gives a slight improvement to the results, most notably when using 3 years instead of a single year. Based on the RMSE summary plot at the bottom, a time series of 7 to 9 years could be used to get the optimal results, but the smaller decrease in RMSE with every added year might not be worth the extra processing time.
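The symmetric window of years around the target year can be written down explicitly. The helper below is a hypothetical Python sketch (the name `years_around` does not come from fill_gaps.R) and assumes the series length is always odd, as in the 1/3/5/7/9-year runs.

```python
from typing import List


def years_around(target: int, n_years: int) -> List[int]:
    """Years in a window of n_years centred on target (n_years must be odd)."""
    if n_years < 1 or n_years % 2 == 0:
        raise ValueError("n_years must be a positive odd number")
    half = n_years // 2
    return list(range(target - half, target + half + 1))


# Windows used for the runs in this section, centred on 2015.
windows = {n: years_around(2015, n) for n in (1, 3, 5, 7, 9)}
```

For example, the 3-year run covers 2014 to 2016 and the 9-year run covers 2011 to 2019.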
1 year:
Number of EOFs: 6
Total RMSE: 0.257584
Week 9 RMSE: 0.2671343
Week 22 RMSE: 0.2588712
3 years:
Number of EOFs: 11
Total RMSE: 0.2314012
Week 9 RMSE: 0.2522208
Week 22 RMSE: 0.2433103
5 years:
Number of EOFs: 13
Total RMSE: 0.2248182
Week 9 RMSE: 0.2504867
Week 22 RMSE: 0.2395364
7 years:
Number of EOFs: 15
Total RMSE: 0.2219548
Week 9 RMSE: 0.2450302
Week 22 RMSE: 0.2332346
9 years:
Number of EOFs: 17
Total RMSE: 0.2217943
Week 9 RMSE: 0.2449905
Week 22 RMSE: 0.2280072
Number of EOFs for 1/3/5/7/9 years: 6/11/13/15/17
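To make the diminishing returns concrete, the short Python sketch below takes the total RMSE values reported in this section and computes how much each extension of the time series lowers the RMSE. Only the arithmetic is new; the variable names are illustrative.

```python
# Total RMSE reported above, keyed by the number of years in the time series.
total_rmse = {1: 0.257584, 3: 0.2314012, 5: 0.2248182, 7: 0.2219548, 9: 0.2217943}

years = sorted(total_rmse)

# Drop in total RMSE gained by moving from each series length to the next.
gains = {
    later: total_rmse[earlier] - total_rmse[later]
    for earlier, later in zip(years, years[1:])
}

for n_years, gain in gains.items():
    print(f"{n_years} years: total RMSE drops by {gain:.4f}")
```

The drop is about 0.026 when going from 1 to 3 years and well under 0.001 when going from 7 to 9 years, while the number of EOFs retained keeps rising.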